Merge it #1

Open
santyr wants to merge 244 commits into hexdaemon:main from lightning-goats:main
Conversation

@santyr santyr commented Feb 6, 2026

No description provided.

Owner

@hexdaemon hexdaemon left a comment

LGTM. Approving.

santyr and others added 29 commits February 6, 2026 08:57
…mpotency

Protocol hardening: version tolerance + deterministic idempotency
…mpotency

Protocol hardening: Phases B+C+D (version tolerance, idempotency, reliable delivery)
…on-idempotency

Hardening: Phase B/C protocol versioning, idempotency & bug fixes
Added an image to enhance the article's visual appeal.
…on-idempotency

Comprehensive hardening: P0/P1 bug fixes, thread safety, security
…on-idempotency

fix: P2/P3 hardening across 13 modules
contribution_ratio was never synced from the ledger to hive_members,
last_seen only updated on connect/disconnect events, and addresses
were never captured at join time. This fixes all three root causes
plus initializes presence tracking at join so uptime_pct accumulates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hive-status RPC only returned tier/joined_at/pubkey for our membership,
so cl-revenue-ops revenue-hive-status showed null for these fields (Issue #36).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ats-addresses

fix: resolve stale member stats and null addresses (#59, #60)
…mat, determinism, dedup

- Bug 1 (Critical): calculate_our_balance now uses identical MemberContribution
  conversion as compute_settlement_plan (proper uptime normalization, int casting,
  rebalance_costs inclusion)
- Bug 2 (Critical): Period format standardized to YYYY-WW across routing_pool.py
  and rpc_commands.py (was YYYY-WNN, mismatched settlement format)
- Bug 3: settle_period atomicity check changed from `if ok is False` to `if not ok`
  to catch None/0 returns from record_pool_distribution
- Bug 4: generate_payments sort now includes peer_id tie-breaker for deterministic
  payment ordering, matching generate_payment_plan
- Bug 5: capital_score now reflects weighted_capacity instead of uptime_pct
- Bug 6: asyncio event loop in settlement_loop wrapped in try/finally to ensure
  loop.close() on exceptions
- Bug 8: Revenue deduplication by payment_hash (application-level check + UNIQUE
  constraint + index on pool_revenue table)
- Bug 9: Removed snapshot_contributions() side-effects from read-only paths
  (get_pool_status, calculate_distribution)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
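
The revenue deduplication described in Bug 8 above (an application-level check backed by a UNIQUE constraint) might look roughly like the sketch below. The table layout, the index name, and the helpers `open_pool_db` and `record_revenue` are assumptions made for illustration; only the `pool_revenue` table name and the `payment_hash` key come from the commit message.

```python
import sqlite3


def open_pool_db():
    """Create an in-memory database with a hypothetical pool_revenue schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pool_revenue ("
        " id INTEGER PRIMARY KEY,"
        " payment_hash TEXT NOT NULL,"
        " amount_sats INTEGER NOT NULL)"
    )
    # The UNIQUE index is the database-level backstop against duplicates.
    conn.execute(
        "CREATE UNIQUE INDEX idx_pool_revenue_payment_hash"
        " ON pool_revenue(payment_hash)"
    )
    return conn


def record_revenue(conn, payment_hash, amount_sats):
    """Record revenue once per payment_hash. Returns True if a row was added."""
    # Application-level check: cheap and unambiguous in the common case.
    seen = conn.execute(
        "SELECT 1 FROM pool_revenue WHERE payment_hash = ?", (payment_hash,)
    ).fetchone()
    if seen is not None:
        return False
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pool_revenue (payment_hash, amount_sats)"
                " VALUES (?, ?)",
                (payment_hash, amount_sats),
            )
    except sqlite3.IntegrityError:
        # Another writer inserted the same hash between the check and the insert.
        return False
    return True
```

Keeping both layers means the common duplicate is rejected without raising, while the index still catches a concurrent writer that slips past the check.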
…ification, signed ACKs

CRITICAL:
- Add ban check to handle_hello/handle_attest (prevents ban evasion via rejoin)
- Add timestamp freshness checks to 23 message handlers with per-type age limits
  (GOSSIP 1hr, INTENT 10min, SETTLEMENT 24hr, INTELLIGENCE 2hr)
- 5-minute future clock skew tolerance

HIGH:
- Add cryptographic signature verification to 13 previously unsigned handlers
  (health_report, liquidity_need/snapshot, route_probe/batch,
   peer_reputation_snapshot, task_request/response, splice_init_request/response,
   splice_update/signed/abort)
- MSG_ACK now signed: create_msg_ack accepts rpc for signing,
  handle_msg_ack verifies signature (backward-compatible)

MODERATE:
- Increase relay dedup window from 300s to 3600s (covers freshness windows)
- Increase MAX_SEEN_MESSAGES from 10000 to 50000

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
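
As a rough illustration of the timestamp freshness checks listed above, the sketch below applies the per-type age limits and the 5-minute future skew tolerance quoted in the commit message. The function name `is_fresh`, the dictionary layout, and the choice to reject unknown message types are assumptions, not the plugin's actual code.

```python
import time

# Age limits quoted in the commit message, in seconds.
MAX_AGE_SECONDS = {
    "GOSSIP": 60 * 60,
    "INTENT": 10 * 60,
    "SETTLEMENT": 24 * 60 * 60,
    "INTELLIGENCE": 2 * 60 * 60,
}
MAX_FUTURE_SKEW_SECONDS = 5 * 60


def is_fresh(msg_type, timestamp, now=None):
    """Return True if the message timestamp is inside its freshness window."""
    now = time.time() if now is None else now
    max_age = MAX_AGE_SECONDS.get(msg_type)
    if max_age is None:
        # Assumption: message types without a configured limit fail closed.
        return False
    age = now - timestamp
    # Negative age means the sender's clock is ahead of ours.
    return -MAX_FUTURE_SKEW_SECONDS <= age <= max_age
```

Passing `now` explicitly keeps the check deterministic in tests.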
CRITICAL: Replace 9 unsafe plugin.rpc calls with safe_plugin.rpc
- handle_expansion_nominate/elect/decline: checkmessage() and getinfo()
- hive_calculate_size: listchannels() and listfunds()
- hive_test_intent: getinfo()
- hive_test_pending_action: listchannels() and getinfo()

These bypassed the RPC_LOCK thread serialization, risking race conditions
when background threads make concurrent RPC calls to lightningd.

CRITICAL: Fix direct dict access on RPC results
- init(): getinfo()['id'] → getinfo().get('id', '') — could crash startup
- hive_test_intent: getinfo()['id'] → .get('id', '')
- hive_test_pending_action: getinfo()['id'] → .get('id', '')
- member_ids set comprehension: m['peer_id'] → m.get('peer_id', '')

HIGH: Wrap unprotected signmessage vote signing in try-except
- _propose_settlement_gaming_ban: vote signing had no error handling
- hive_propose_ban: vote signing had no error handling
Both could crash if signmessage RPC fails after proposal creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
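
The two ideas in this commit, serializing RPC calls behind one lock and reading RPC results defensively, can be sketched as follows. `LockedRpc` and `our_node_id` are hypothetical names; the real `safe_plugin.rpc` wrapper and `RPC_LOCK` are not shown on this page, so this is only an approximation of the pattern.

```python
import threading


class LockedRpc:
    """Hypothetical wrapper that serializes every call to an RPC client."""

    def __init__(self, rpc):
        self._rpc = rpc
        self._lock = threading.Lock()

    def call(self, method, *args, **kwargs):
        # Only one thread at a time may talk to the underlying client.
        with self._lock:
            return getattr(self._rpc, method)(*args, **kwargs)


def our_node_id(rpc):
    """Read the node id without raising KeyError on a malformed reply."""
    info = rpc.call("getinfo") or {}
    return info.get("id", "")
```

Callers that receive an empty string can decide how to degrade, instead of crashing during startup.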
…safety

- strategic_positioning: fix AttributeError crashes (fleet_coverage, target_capacity_sats, value_score → correct attribute names)
- cooperative_expansion: fix TOCTOU in join_remote_round (atomic check-and-set), negative liquidity score (clamp to 0), deterministic election tie-breaker (peer_id), use-after-free in handle_decline (capture decline_count in local), state validation in handle_elect, prune unbounded _recent_opens/_target_cooldowns
- governance: add threading.Lock for failsafe budget TOCTOU race (atomic check-execute-update)
- settlement: cap remainder allocation to len(frac_order) preventing cyclic wrapping
- bridge: fix double record_failure() on timeout (subprocess.TimeoutExpired → TimeoutError chain)
- liquidity_coordinator: fix MCF assignment ID collision (include channel suffixes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
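
The governance fix above describes an atomic check-execute-update guarded by a `threading.Lock`. A minimal sketch of that pattern is shown below; the class `FailsafeBudget` and its fields are invented for illustration and do not reflect the actual governance module.

```python
import threading


class FailsafeBudget:
    """Hypothetical spending budget with an atomic check-execute-update."""

    def __init__(self, limit_sats):
        self._limit = limit_sats
        self._spent = 0
        self._lock = threading.Lock()

    @property
    def spent(self):
        with self._lock:
            return self._spent

    def try_spend(self, amount_sats, action):
        """Run action() only if the budget allows it; return True if it ran."""
        with self._lock:
            # Check, execute, and update happen under one lock, so no other
            # thread can pass the check against a stale total.
            if self._spent + amount_sats > self._limit:
                return False
            action()
            self._spent += amount_sats
            return True
```

If `action()` raises, the total is left unchanged, which is one possible policy; a real implementation may want to account for partially completed actions.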
- Remove trustedcoin plugin (explorer-only Bitcoin backend)
- Add vitality plugin v0.4.5 for plugin health monitoring
- Update Docker image version to 2.2.7
- vitality auto-restarts failed plugins, improving production uptime

Ref: lightning-goats/cl-hive
…orrectness

P0 crashes fixed:
- channel_rationalization: _get_topology_snapshot() → get_topology_snapshot()
- network_metrics: same AttributeError crash on nonexistent private method
- fee_coordination: TypeError when TemporalPattern.hour_of_day/day_of_week is None
- task_manager: crash on None target/amount_sats in _execute_expand_task

P1 logic errors fixed:
- channel_rationalization: self.analyzer → self.rationalizer.redundancy_analyzer
- channel_rationalization: r.owner_id → r.owner_member, r.freed_capacity_sats → r.freed_capital_sats
- channel_rationalization: self.our_pubkey → self._our_pubkey
- fee_coordination: day_of_week == -1 → is None for pattern matching
- planner: listpeerchannels(target) → listpeerchannels(id=target)
- planner: guard for None return from create_intent before accessing .intent_id
- yield_metrics: net_revenue now subtracts total_cost (including open_cost) not just rebalance_cost
- routing_intelligence: int() wrap on float avg_capacity_sats to match type annotation
- mcf_solver: reverse edges now properly filtered via is_reverse flag instead of cost_ppm < 0

P2 edge cases fixed:
- mcf_solver: solution_valid false when no solution exists (was reporting true)
- peer_reputation: force_close_count uses max() not sum() across reporters
- peer_reputation: filter None from unique_reporters set
- network_metrics: use hive_connections not external topology for "not connected to"
- yield_metrics: clamp depletion_risk and saturation_risk to [0, 1.0]
- yield_metrics: init _remote_yield_metrics in __init__ instead of hasattr
- channel_rationalization: init _remote_coverage/_remote_close_proposals in __init__
- channel_rationalization: guard ZeroDivisionError on empty topology
- health_aggregator: round() instead of int() for health score truncation
- planner: clamp negative ratio in channel size calculation
- fee_coordination: min strength floor (0.1) for route markers preserving failure signal
- fee_intelligence: filter None from reporters list
- quality_scorer: Tuple[bool, str] type hint for Python 3.8 compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add threading.Lock to AdaptiveFeeController, StigmergicCoordinator,
  MyceliumDefenseSystem, TimeBasedFeeAdjuster, FeeCoordinationManager
  to protect shared state from concurrent modification
- Add threading.Lock to VPNTransportManager with snapshot-swap pattern
  for atomic reconfiguration and protected stats/peer state
- Route task_manager._execute_expand_task through governance engine
  instead of directly calling rpc.fundchannel (security: fail closed)
- Fix outbox retry: parse/serialize errors now fail permanently instead
  of retrying indefinitely with backoff
- Add cache bounds: cap _remote_pheromones (500 peers),
  _markers (1000 routes), _peer_stats (500 peers),
  _remote_yield_metrics (200 peers), _flow_history (500 channels)
- Add stale key eviction to rate limiters in peer_reputation,
  routing_intelligence, liquidity_coordinator, task_manager

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
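
The snapshot-swap pattern mentioned for VPNTransportManager can be illustrated with the sketch below. `TransportSettings` and `TransportManager` are hypothetical stand-ins, and the fields are assumptions; the point is only that a complete immutable snapshot is built first and then published with one reference swap.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportSettings:
    """Immutable configuration snapshot (fields are illustrative)."""
    enabled: bool = False
    allowed_subnets: tuple = ()


class TransportManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._settings = TransportSettings()

    def reconfigure(self, enabled, allowed_subnets):
        # Build the complete new snapshot before taking the lock...
        new_settings = TransportSettings(bool(enabled), tuple(allowed_subnets))
        # ...then publish it with a single reference swap.
        with self._lock:
            self._settings = new_settings

    def settings(self):
        # Readers get a whole snapshot, never a half-updated one.
        with self._lock:
            return self._settings
```

A reader that fetched the old snapshot keeps a consistent view even while a reconfiguration is in progress.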
…-revenue-ops

5 bugs fixed in the cooperative fee coordination flow:

- Non-salient fee changes now correctly revert to current_fee (was returning
  the modified fee even when salience filter said "not worth changing")
- pheromone_levels RPC now returns list under "pheromone_levels" key with
  field names matching cl-revenue-ops expectations (level, above_threshold)
- New hive-record-routing-outcome RPC for pheromone updates when
  source/destination are unavailable (fallback was calling read-only
  hive-pheromone-levels with invalid write params)
- Health multiplier comments corrected to match actual math ranges

These bugs combined meant the pheromone-based adaptive fee learning signal
was completely non-functional — routing outcomes were never recorded as
pheromone updates, and pheromone levels were unreadable by cl-revenue-ops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
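
The first fix above (non-salient changes must fall back to `current_fee`) can be sketched as below. The function name `apply_salience_filter` and the 5% relative threshold are assumptions for illustration; the actual salience rule is not shown on this page.

```python
def apply_salience_filter(current_fee, proposed_fee, min_change_ratio=0.05):
    """Return the fee to use after a salience check.

    Assumption: a change is salient when it moves the fee by at least
    min_change_ratio relative to the current fee.
    """
    if current_fee <= 0:
        # No meaningful baseline to compare against; accept the proposal.
        return proposed_fee
    change_ratio = abs(proposed_fee - current_fee) / current_fee
    if change_ratio < min_change_ratio:
        # Not worth changing: keep the current fee, not the modified one.
        return current_fee
    return proposed_fee
```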
…ting, MCF

Critical fixes:
- CircularFlow.cycle → CircularFlow.members: AttributeError crash in
  get_shareable_circular_flows and get_all_circular_flow_alerts
- BFS fleet path finding used shared external peers as connectivity proxy
  instead of checking actual direct channels between members (phantom routes)
- LiquidityCoordinator._lock defined but never acquired — all shared
  mutable state unprotected from concurrent access

Medium fixes:
- MCFCircuitBreaker not thread-safe (added threading.Lock)
- MCF get_total_demand only counted inbound needs — fleets with only
  outbound needs never triggered optimization
- receive_mcf_assignment could exceed MAX_MCF_ASSIGNMENTS if cleanup
  didn't free space (now rejects)
- Empty string peers from failed channel lookups polluted circular
  flow detection graph
- to_us_msat not converted to int before comparison (Msat type safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
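
To illustrate the BFS fix above, the sketch below searches only over direct channels between members and drops empty-string peers before building the graph. The function `find_fleet_path` and the edge-list input format are assumptions, not the module's real interface.

```python
from collections import deque


def find_fleet_path(direct_channels, source, target):
    """Breadth-first search over direct member-to-member channels.

    direct_channels is an iterable of (member_a, member_b) pairs.
    Returns the shortest path as a list of member ids, or None.
    """
    graph = {}
    for a, b in direct_channels:
        if not a or not b:
            # Skip empty ids left behind by failed channel lookups.
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    if source not in graph or target not in graph:
        return None

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sorted iteration keeps the result deterministic.
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

Two members that merely share an external peer do not appear as neighbors here, which is what prevents phantom routes.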
… encapsulation

- create_mcf_ack_message() called with 4 extra args (TypeError on every ACK)
- create_mcf_completion_message() called with 7 extra args (TypeError on every completion)
- ctx.state_manager AttributeError in rebalance_hubs/rebalance_path (safe getattr)
- execute_hive_circular_rebalance missing permission check for fund movements
- get_mcf_optimized_path ignoring to_channel parameter (wrong assignment match)
- _check_stuck_mcf_assignments reaching into private dict (encapsulated with lock)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…defensive copies

State Manager:
- _validate_state_entry() no longer silently mutates input dict (available > capacity now rejected)
- update_peer_state() makes defensive copies of fee_policy, topology, capabilities
- Caps available_sats at capacity_sats in update_peer_state()
- load_from_database() and _load_state_from_db() now use from_dict() for consistent field handling

Planner:
- Added missing feerate gate to _propose_expansion() (documented but never implemented)
- Fixed cfg.market_share_cap_pct crash → getattr(cfg, 'market_share_cap_pct', 0.20)
- Fixed cfg.governance_mode crash → getattr(cfg, 'governance_mode', 'advisor')

Gossip:
- Added timestamp freshness check: rejects messages >1hr old or >5min in future

23 new tests, 1225 total passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
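
The State Manager changes above (validation that rejects instead of mutating, defensive copies, and capping `available_sats`) might be sketched like this. Both functions are simplified stand-ins with assumed signatures; only the field names `capacity_sats`, `available_sats`, and `fee_policy` come from the commit message.

```python
import copy


def validate_state_entry(entry):
    """Return True if the entry is acceptable. Never modifies entry."""
    capacity = entry.get("capacity_sats")
    available = entry.get("available_sats")
    if not isinstance(capacity, int) or not isinstance(available, int):
        return False
    if capacity < 0 or available < 0:
        return False
    # Reject instead of silently clamping the caller's dict.
    return available <= capacity


def update_peer_state(store, peer_id, capacity_sats, available_sats, fee_policy):
    """Store peer state using copies so later caller mutations cannot leak in."""
    store[peer_id] = {
        "capacity_sats": capacity_sats,
        "available_sats": min(available_sats, capacity_sats),
        "fee_policy": copy.deepcopy(fee_policy),
    }
```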
…te_entry()

Prevents unbounded arrays, non-string entries, and oversized capability
strings from being accepted via gossip or FULL_SYNC messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
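
A bounds check of the kind described above could look like the following sketch. The limits `MAX_CAPABILITIES` and `MAX_CAPABILITY_LEN` and the function name are assumed values chosen for illustration; the real limits are not stated on this page.

```python
MAX_CAPABILITIES = 32      # assumed cap on list length
MAX_CAPABILITY_LEN = 64    # assumed cap on each string's length


def valid_capabilities(capabilities):
    """Accept only a bounded list of bounded strings."""
    if not isinstance(capabilities, list):
        return False
    if len(capabilities) > MAX_CAPABILITIES:
        return False
    return all(
        isinstance(item, str) and len(item) <= MAX_CAPABILITY_LEN
        for item in capabilities
    )
```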
…gnments()

Added get_all_assignments() method to LiquidityCoordinator and updated
the mcf_assignments RPC to use it instead of reaching into private dict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add vitality-amboss=true to docker-entrypoint.sh config generation
- Add vitality-watch-channels=true for channel health monitoring
- Add vitality-expiring-htlcs=50 for HTLC expiry warnings
- Update Dockerfile comment to document Amboss integration
…AttributeError

Critical fixes across 5 modules:

- mcf_solver: MCFCircuitBreaker.get_status() race condition — can_execute()
  called outside lock returned stale value; refactored to _can_execute_unlocked()
  called atomically within lock
- liquidity_coordinator: 8 thread safety fixes — missing locks on get_status(),
  get_pending_mcf_assignments(), get_mcf_assignment(), update_mcf_assignment_status(),
  create_mcf_ack_message(), create_mcf_completion_message(), get_mcf_status();
  deadlock fix (non-reentrant lock + nested call); new claim_pending_assignment()
  atomic method to prevent TOCTOU double-claim race
- cl-hive.py: _send_mcf_ack() TypeError — create_mcf_ack_message() takes no
  params but was called with 4 positional args; sendcustommsg keyword args fix;
  broadcast_intent_abort NameError (plugin → safe_plugin); missing coordinator
  check in handle_mcf_completion_report; TOCTOU claim race replaced with atomic
  claim_pending_assignment()
- cost_reduction: CircularFlow AttributeError (cf.members_count → cf.cycle_count);
  hub scoring division-by-zero guard; record_mcf_ack() thread safety with
  dedicated lock and proper __init__ initialization
- intent_manager: get_intent_stats() race — _remote_intents read without lock

25 new tests covering all fixes including concurrent access verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
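
Two patterns from this commit, an `_unlocked` helper that avoids re-acquiring a non-reentrant lock and an atomic claim that closes the TOCTOU window, are sketched below. `AssignmentStore` and its status strings are hypothetical; only the method name `claim_pending_assignment` is taken from the commit message.

```python
import threading


class AssignmentStore:
    """Hypothetical store showing an atomic claim under a non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._assignments = {}

    def add(self, assignment_id):
        with self._lock:
            self._assignments[assignment_id] = "pending"

    def _status_unlocked(self, assignment_id):
        # Caller must already hold self._lock. Taking the lock again here
        # would deadlock because threading.Lock is not reentrant.
        return self._assignments.get(assignment_id)

    def get_status(self, assignment_id):
        with self._lock:
            return self._status_unlocked(assignment_id)

    def claim_pending_assignment(self, assignment_id):
        """Move pending -> executing atomically; True only for the winner."""
        with self._lock:
            if self._status_unlocked(assignment_id) != "pending":
                return False
            self._assignments[assignment_id] = "executing"
            return True
```

Because the check and the status change share one critical section, two threads cannot both observe "pending" and both claim the same assignment.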
hexdaemon and others added 30 commits February 25, 2026 09:42
- H-1: Move listfunds() RPC call outside _channel_peer_cache_lock
- H-2: Add _rate_lock to protect _probe_rate/_batch_rate dicts in routing_intelligence
- M-6: Log full traceback in message dispatcher exception handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
D2: cleanup_expired_intents() performs UPDATE + DELETE without a
transaction, risking orphaned records on crash. Wrap in BEGIN IMMEDIATE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
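
Wrapping an UPDATE plus DELETE in `BEGIN IMMEDIATE`, as the commit above describes, can be sketched with Python's `sqlite3` module. The `intents` schema, the `retention_seconds` parameter, and the helper `open_intent_db` are assumptions for illustration; only the function name `cleanup_expired_intents` comes from the commit message.

```python
import sqlite3


def open_intent_db():
    # isolation_level=None selects autocommit mode, so the explicit
    # BEGIN IMMEDIATE below is the only transaction control in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE intents ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " expires_at INTEGER NOT NULL)"
    )
    return conn


def cleanup_expired_intents(conn, now, retention_seconds):
    """Expire stale pending intents and purge old expired ones atomically."""
    # IMMEDIATE takes the write lock up front instead of on first write.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE intents SET status = 'expired'"
            " WHERE status = 'pending' AND expires_at <= ?",
            (now,),
        )
        deleted = conn.execute(
            "DELETE FROM intents"
            " WHERE status = 'expired' AND expires_at <= ?",
            (now - retention_seconds,),
        ).rowcount
        conn.execute("COMMIT")
    except Exception:
        # Either both statements take effect or neither does.
        conn.execute("ROLLBACK")
        raise
    return deleted
```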
The eviction used min-by-timestamp which is unpredictable under
clock skew or same-second updates. Use dict insertion order (FIFO)
which correctly evicts the oldest tracked peer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
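
FIFO eviction based on dict insertion order, as described above, can be sketched as follows. The function `track_peer` and its parameters are hypothetical. The sketch relies on the guarantee that Python dicts preserve insertion order (Python 3.7+) and that assigning to an existing key does not move it.

```python
def track_peer(cache, peer_id, value, max_entries):
    """Insert or update a peer, evicting the oldest tracked peer when full."""
    if peer_id in cache:
        # Updating an existing key keeps its original insertion position.
        cache[peer_id] = value
        return
    while cache and len(cache) >= max_entries:
        # The first key in iteration order is the oldest tracked peer.
        oldest = next(iter(cache))
        del cache[oldest]
    cache[peer_id] = value
```

Unlike a min-by-timestamp scan, this does not depend on wall-clock values, so clock skew and same-second updates cannot change which entry is evicted.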
… more

HIGH:
- Fee broadcast tracking no longer resets on non-broadcast path
- 2-member hive quorum no longer requires impossible 3 votes
- handle_welcome no longer trusts remote tier or adds peer as full member
- Channel existence check fails closed on RPC error
- hive-close-channel uses correct param name

MEDIUM:
- Safe amount_msat parsing
- max_channel_sats enforcement
- market_share lower bound
- Contribution ledger TOCTOU
- Shutdown ordering
- Dead config annotation
- Signed bootstrap promotion vouch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
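
The quorum fix above (a 2-member hive should not need 3 votes) amounts to capping the required vote count at the number of eligible voters. The sketch below is one possible rule under assumed parameters: the function name `required_votes`, the simple-majority formula, and the default floor of 3 are illustrative and may differ from the plugin's actual policy.

```python
def required_votes(eligible_voters, min_votes=3):
    """Votes needed to pass, never more than the number of eligible voters."""
    if eligible_voters <= 0:
        return 0
    majority = eligible_voters // 2 + 1
    # Apply the floor, then cap it so small hives can still reach quorum.
    return min(max(min_votes, majority), eligible_voters)
```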
3 participants